NSF PAR Search | NSF Public Access Repository


Accurate prediction of nucleic acid binding proteins using protein language model

https://doi.org/10.1093/bioadv/vbaf008

Wu, Siwen; Xu, Jinbo; Guo, Jun-tao (December 2024, Bioinformatics Advances)
Bateman, Alex (Ed.)
Abstract Motivation: Nucleic acid binding proteins (NABPs) play critical roles in various essential biological processes. Many machine learning-based methods have been developed to predict different types of NABPs. However, most of these studies have limited applications in predicting the types of NABPs for any given protein with unknown functions, due to several factors such as dataset construction, prediction scope and features used for training and testing. In addition, single-stranded DNA binding proteins (single-stranded DBPs, or SSBs) have not been extensively investigated for identifying novel SSBs from proteins with unknown functions. Results: To improve prediction accuracy of different types of NABPs for any given protein, we developed hierarchical and multi-class models with machine learning-based methods and a feature extracted from the protein language model ESM2. Our results show that by combining the feature from ESM2 and machine learning methods, we can achieve high prediction accuracy of up to 95% for each stage in the hierarchical approach, and 85% overall prediction accuracy from the multi-class approach. More importantly, besides the much improved prediction of other types of NABPs, the models can be used to accurately predict single-stranded DBPs, which remain underexplored. Availability and implementation: The datasets and code can be found at https://figshare.com/projects/Prediction_of_nucleic_acid_binding_proteins_using_protein_language_model/211555.
Full Text Available
A Statistical Study of the Decreased TEC Region During Summer at Northern High Latitudes

https://doi.org/10.1029/2025gl114676

Wang, Jian‐ping; Zhang, Bei‐chen; Zhang, Qing‐yu; Zhang, Shun‐rong; Zhang, Qing‐he; Wang, Guo‐jun; Liu, Jian‐jun (April 2025, Geophysical Research Letters)

Abstract Based on the vertical Total Electron Content (TEC) data observed by the Global Navigation Satellite System in the northern hemisphere, a large area of low plasma density during summer at high latitudes, termed the decreased TEC region, was investigated statistically between 2014 and 2024. Compared with the classical depleted structures that usually occur in the nighttime F region at high latitudes during winter, the decreased TEC region is usually found in the sunlit polar cap ionosphere during summer. The decreased TEC region is predominantly located in regions above 70° magnetic latitude for moderate and high solar activity. The lower‐TEC region is biased towards the dawn and midnight sectors. Along the 18:25–06:25 Magnetic Local Time meridian, the depth of the decreased TEC region reached 7.6 TECu in 2014. The decreased TEC region is deeper for high Kp (Kp > 2) than for low Kp (Kp ≤ 2).
Full Text Available
Improved prediction of DNA and RNA binding proteins with deep learning models

https://doi.org/10.1093/bib/bbae285

Wu, Siwen; Guo, Jun-tao (May 2024, Briefings in Bioinformatics)

Abstract Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing as well as the prediction scopes in these studies have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods including both hierarchical and multi-class approaches to predict the types of NABPs for any given protein. The deep learning models employ two layers of convolutional neural network and one layer of long short-term memory. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs with ~12% improvement. Moreover, we explored the prediction accuracy of single-stranded DNA binding proteins and their effect on the overall prediction accuracy of NABP predictions.
Full Text Available
Iron nanoparticle/carbon nanotube composite as oxidase-like nanozyme for visual analysis of total antioxidant capacity

https://doi.org/10.1016/j.fochx.2024.102093

Liu, Junlin; Xie, Sophia; Wang, Nan; Sun, Zhongyue; Tang, Lina; Zhang, Guo-jun; Tressel, John; Zhang, Yulin; Sun, Yujie; Chen, Shaowei (December 2024, Food chemistry)

Full Text Available
DNA binding and transposition activity of the Sleeping Beauty transposase: role of structural stability of the primary DNA-binding domain

https://doi.org/10.1093/nar/gkae1188

Ranjan, Venkatesh V; Leighton, Gage O; Yan, Chenbo; Arango, Maria; Lustig, Janna; Corona, Rosario I; Guo, Jun-Tao; Nesmelov, Yuri E; Ivics, Zoltán; Nesmelova, Irina V (January 2025, Nucleic Acids Research)

Abstract DNA transposons have emerged as promising tools in both gene therapy and functional genomics. In particular, the Sleeping Beauty (SB) DNA transposon has advanced into clinical trials due to its ability to stably integrate DNA sequences of choice into eukaryotic genomes. The efficiency of the DNA transposon system depends on the interaction between the transposon DNA and the transposase enzyme that facilitates gene transfer. In this study, we assess the DNA-binding capabilities of variants of the SB transposase and demonstrate that the structural stability of the primary DNA-recognition subdomain, PAI, affects SB DNA-binding affinity and transposition activity. This fundamental understanding of the structure–function relationship of the SB transposase will assist the design of improved transposases for gene therapy applications.
Full Text Available
Platinum nanowires/MXene nanosheets/porous carbon ternary nanocomposites for in situ monitoring of dopamine released from neuronal cells

https://doi.org/10.1016/j.talanta.2024.126496

Xiao, Xueqian; Ni, Wei; Yang, Yang; Chen, Qinhua; Zhang, Yulin; Sun, Yujie; Liu, Qiming; Zhang, Guo-jun; Yao, Qunfeng; Chen, Shaowei (October 2024, Talanta)

Full Text Available
Nanozymes for the Therapeutic Treatment of Diabetic Foot Ulcers

https://doi.org/10.1021/acsbiomaterials.4c00470

Xiao, Xueqian; Zhao, Fei; DuBois, Davida Briana; Liu, Qiming; Zhang, Yu Lin; Yao, Qunfeng; Zhang, Guo-Jun; Chen, Shaowei (July 2024, ACS Biomaterials Science & Engineering)

Full Text Available
Two-stage error detection to improve electron microscopy image mosaicking

https://doi.org/10.1016/j.compbiomed.2024.108456

Shi, Jiahao; Ge, Hongyu; Wang, Shuohong; Wei, Donglai; Yang, Jiancheng; Cheng, Ao; Schalek, Richard; Guo, Jun; Lichtman, Jeff; Wang, Lirong; et al (June 2024, Computers in Biology and Medicine)

Full Text Available
Ancient gene clusters initiate monoterpenoid indole alkaloid biosynthesis and C3 stereochemistry inversion

https://doi.org/10.1101/2025.01.07.631695

Hwang, Jaewook; Kirshner, Jonathan; Deschênes, Daniel André Ramey; Richardson, Matthew Bailey; Fleck, Steven J; Guo, Jun; Perley, Jacob Owen; Shahsavarani, Mohammadamin; Garza-Garcia, Jorge Jonathan Oswaldo; Seveck, Alyssa Dawn; et al (January 2025, bioRxiv)

Abstract The inversion of C3 stereochemistry in monoterpenoid indole alkaloids (MIAs), derived from the central precursor strictosidine (3S), is essential for synthesizing numerous 3R MIAs and oxindoles, including the antihypertensive drug reserpine found in Rauvolfia serpentina (Indian snakeroot) and Rauvolfia tetraphylla (devil pepper) of the plant family Apocynaceae. MIA biosynthesis begins with the reduction of strictosidine aglycone by various reductases, preserving the initial 3S stereochemistry. In this study, we identify and biochemically characterize a conserved oxidase-reductase pair from the Apocynaceae, Rubiaceae, and Gelsemiaceae families of the order Gentianales: the heteroyohimbine/yohimbine/corynanthe C3-oxidase (HYC3O) and C3-reductase (HYC3R). These enzymes collaboratively invert the 3S stereochemistry to 3R across a range of substrates, resolving the long-standing question about the origin of 3R MIAs and oxindole derivatives, and facilitation of reserpine biosynthesis. Notably, HYC3O and HYC3R are located within gene clusters in both the R. tetraphylla and Catharanthus roseus (Madagascar periwinkle) genomes, which are partially homologous to an elusive geissoschizine synthase (GS) gene cluster we also identified in these species. In R. tetraphylla, these clusters occur closely in tandem on a single chromosome, likely stemming from a single segmental duplication event, while in C. roseus, a closely related member of rauvolfioid Apocynaceae, they were later separated by a chromosomal translocation. The ancestral genomic context for both clusters can be traced all the way back to common ancestry with grapevine. Given the presence of syntenic GS homologs in Mitragyna speciosa (Rubiaceae), the GS cluster, at least in part, probably evolved at the base of the Gentianales, which split from other core eudicots up to 135 million years ago. We also show that the strictosidine biosynthetic gene cluster, required to initiate the MIA pathway, plausibly evolved concurrently. The reserpine biosynthetic cluster likely arose much later in the rauvolfioid lineage of Apocynaceae. Collectively, our work uncovers the genomic and biochemical basis for key events in MIA evolution and diversification, providing insights beyond the well-characterized vinblastine and ajmaline biosynthetic pathways.
Full Text Available
Pt,P-codoped carbon nitride nanoenzymes for fluorescence and colorimetric dual-mode detection of cholesterol

https://doi.org/10.1016/j.aca.2024.342351

Chen, Meiling; Yang, Yang; Chen, Qinhua; Tang, Lina; Liu, Junlin; Sun, Yujie; Liu, Qiming; Zhang, Yulin; Zhang, Guo-jun; Chen, Shaowei (April 2024, Analytica Chimica Acta)

Full Text Available
